Introduction to 
Protein Structure 



Second Edition 



Carl Branden 

Microbiology and Tumor Biology Center 

Karolinska Institute 

Stockholm 

Sweden 

John Tooze 

Imperial Cancer Research Fund Laboratories 

Lincoln's Inn Fields

London 

UK 







[...] is still slow compared with the rate of accumulation of
amino acid sequence data. This makes the folding problem, the successful
prediction of a protein's tertiary structure from its amino acid sequence,
central to rapid progress in post-genomic biology. We will, therefore, in
this chapter briefly describe the implications of protein homology
for the prediction of secondary and tertiary structure before giving some
examples of protein engineering and protein design.

Homologous proteins have similar structure and
function

The term homology as used in a biological context is defined as similarity of
structure, physiology, development, and evolution of organisms based on
common genetic factors. The statement that two proteins are homologous
therefore implies that their genes have evolved from a common ancestral
gene. Homologous proteins are mostly recognized by statistically significant
similarities in their amino acid sequences. Usually, they also have similar
functions, although there are some known exceptions where genes for
ancient enzymes have been recruited at a later stage in evolution to produce
proteins with quite different functions. An example is provided by one of the
structural components in the eye lens that is homologous to the ancient
glycolytic enzyme lactate dehydrogenase. Once a novel gene has been cloned
and sequenced, a search for amino acid sequence similarity between the
corresponding protein and other protein sequences should be made. Usually,
this is done by comparison with databases of known protein sequences
using one of the standard sequence alignment computer programs.
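
The comparison step can be illustrated with a minimal sketch of global
pairwise alignment scoring by dynamic programming (the Needleman-Wunsch
scheme). This is not the method of any particular database search program:
the function name and the match, mismatch, and gap values are assumptions
chosen for illustration, and real programs use amino acid substitution
matrices and faster heuristics.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Best global alignment score of sequences a and b.

    Illustrative scoring only (assumed values): +1 for identical residues,
    -1 for a mismatch, -2 for each gap position.
    """
    # prev[j] holds the best score for aligning the processed prefix of a
    # with the first j residues of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap      # residue of a opposite a gap
            gap_in_a = curr[j - 1] + gap  # residue of b opposite a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACDE", "ACE"))
```

Only one row of the scoring table is kept at a time, which is enough when
the score, rather than the alignment itself, is wanted.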

Two proteins are considered to be homologous when they have identical
amino acid residues in a significant number of sequential positions along the
polypeptide chains. Using statistical methods based on comparisons of
computer-generated random sequences, it is relatively straightforward to assess
how many positions need to be identical for a statistically significant
identity between two sequences. However, it is also possible for proteins
with sequence identity below the level of statistical significance to have
similar functions and similar structures. In such cases, a few conserved
residues form sequence patterns or motifs that can be used to identify other
proteins that belong to the same functional family. Frequently, members of
such families have similar three-dimensional structures, even though the
sequence identities are not statistically, but only functionally, significant.
Databases for such families, based on such sequence motifs, are available on
the World Wide Web (see p. 394) and are very useful for assigning function to
a novel protein.
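
Both ideas in this paragraph can be sketched in a few lines. The example
below is a simplified illustration, not the procedure of any specific
program: it assumes the two sequences are already aligned to equal length,
estimates significance by comparing the observed identity with that of
shuffled copies of one sequence, and scans for a motif written as a regular
expression. All function names are hypothetical. The pattern in the usage
example is the widely cited P-loop (Walker A) consensus [AG]-x(4)-G-K-[ST],
which is used here only as an illustration and is not taken from the text.

```python
import random
import re
import statistics
from typing import List


def fraction_identical(a: str, b: str) -> float:
    """Fraction of aligned positions with identical residues ('-' never matches)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and pre-aligned to equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def identity_z_score(a: str, b: str, n_shuffles: int = 1000, seed: int = 0) -> float:
    """Observed identity expressed in standard deviations above the
    identities obtained with randomly shuffled copies of b."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = fraction_identical(a, b)
    residues = list(b)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # same composition, random order
        background.append(fraction_identical(a, "".join(residues)))
    mean = statistics.mean(background)
    spread = statistics.pstdev(background)
    return (observed - mean) / spread if spread > 0 else float("inf")


def find_motif(sequence: str, pattern: str) -> List[int]:
    """Start positions (0-based) of non-overlapping matches of a regex motif."""
    return [m.start() for m in re.finditer(pattern, sequence)]


if __name__ == "__main__":
    seq = "ACDEFGHIKLMNPQRSTVWY" * 2
    print(identity_z_score(seq, seq, n_shuffles=200))
    print(find_motif("MSGAGESGKSTLLK", r"[AG].{4}GK[ST]"))
```

Shuffling keeps the amino acid composition fixed, so the background reflects
chance identity between sequences of the same composition rather than
between arbitrary strings.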

If significant amino acid sequence identity is found with a protein of
known crystal structure, a three-dimensional model of the novel protein
can be constructed, using computer modeling, on the basis of the sequence
alignment and the known three-dimensional structure. This model can then serve
as an excellent basis for identifying amino acid residues involved in the
active site or in antigenic epitopes, and the model can be used for protein
engineering, drug design, or immunological studies.

Because the databases are now very large and growing exponentially, currently
containing more than 500,000 protein sequences, the standard
sequence alignment programs have been designed to provide a compromise
between the speed and the accuracy of the search. As a result, they work well
only for sequences with a reasonably high degree of sequence identity, usually
of [...]. Much more sensitive programs have been written
that search for both identity and conserved structural properties and also for
relatedness in different physical properties, but these inevitably require
more computer time. Properly used, such programs can identify structural
and functional similarity where the standard programs fail to do so.



Homologous proteins always contain a core region where the general folds of
the polypeptide chains are very similar. This core region contains mainly the
secondary structure elements that build up the interior of the protein; in
other words, the scaffolds of homologous proteins have similar
three-dimensional structures. Even distantly related proteins with low sequence
identity have similar scaffold structures, although minor adjustments occur in
the positions of the secondary structure elements to accommodate differences
in the arrangements of the hydrophobic side chains in the interior of the
protein. The higher the sequence identity, the more closely related are the
scaffold structures (Figure 17.1). This has important implications for model
building of homologous proteins; the more distantly related the proteins are,
the more the scaffold must be adjusted to model the new structure.
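
The structural divergence referred to in Figure 17.1 is measured as a root
mean square deviation (RMSD) of atomic positions. The sketch below shows
only that final calculation. It assumes two equal-length lists of
equivalent atoms whose coordinates (in Å) have already been superimposed;
the superposition step itself is not shown, and the function name is
hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root mean square deviation between paired atomic positions.

    Assumes the two structures are already optimally superimposed and that
    atom i in coords_a is equivalent to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    core_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    core_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(rmsd(core_a, core_b))
```

Restricting the atom lists to the core region, as in the figure, keeps the
variable loops from dominating the value.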

Loop regions that connect the building blocks of scaffolds can vary
considerably both in length and in structure. The problem of predicting the
three-dimensional structure of a protein that is homologous to a protein of
known three-dimensional structure is therefore mainly a question of predicting
the structure of the loop regions once the conformation of the scaffold has
been adjusted. As mentioned in Chapter 2, loop regions do not have
random structures, and their main-chain conformations cluster in sets of
similar structures. The conformation of each set depends more on the number
of amino acids in the loop and the type of secondary structure elements that
it connects than on the actual amino acid sequences. Therefore it is possible
to use a database of loop regions from proteins of known structure to obtain a
preliminary model of the loop regions of a protein of unknown structure. To
model a protein structure, suitable main-chain loop conformations from this
database are attached to the scaffold, which is presumed to be similar to that
of the known homologous protein.

The conformations of the side chains are then predicted, followed by
refinement of the model by maximizing the interaction energies of the amino
acids. Analysis of structures determined to high resolution has shown that
only a few side-chain conformations frequently occur in proteins. These are
called rotamers, and the modeling of side chains employs databases of such
rotamers.
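
The loop database lookup described above can be sketched as a simple filter.
Everything in this example is hypothetical: the `LoopEntry` fields, the
entry names, and the toy sequences are assumptions made for illustration,
and a real loop library would also store the main-chain coordinates. The
sketch selects loops by length and by the types of the flanking secondary
structure elements, then ranks the candidates by sequence identity to the
target loop.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LoopEntry:
    source: str              # hypothetical identifier of the parent structure
    length: int              # number of residues in the loop
    flanks: Tuple[str, str]  # secondary structure types joined, e.g. ("beta", "beta")
    sequence: str            # one-letter amino acid sequence of the loop


def candidate_loops(database: List[LoopEntry],
                    length: int,
                    flanks: Tuple[str, str],
                    target_sequence: Optional[str] = None) -> List[LoopEntry]:
    """Loops of the requested length and flanking types, best sequence match first."""
    hits = [e for e in database if e.length == length and e.flanks == flanks]
    if target_sequence is not None:
        def identities(entry: LoopEntry) -> int:
            return sum(x == y for x, y in zip(entry.sequence, target_sequence))
        # sorted() is stable, so equally good candidates keep database order
        hits = sorted(hits, key=identities, reverse=True)
    return hits


if __name__ == "__main__":
    toy_db = [
        LoopEntry("loopA", 4, ("beta", "beta"), "GDGS"),
        LoopEntry("loopB", 4, ("beta", "beta"), "NGKT"),
        LoopEntry("loopC", 5, ("alpha", "beta"), "PDGSK"),
    ]
    for entry in candidate_loops(toy_db, 4, ("beta", "beta"), "NGKS"):
        print(entry.source, entry.sequence)
```

Length and flanking types are used as the primary filter because, as stated
above, they constrain the loop conformation more than the sequence does.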

Figure 17.1 The relation between the divergence of amino acid
sequence and three-dimensional structure of the core region of
homologous proteins. Known structures of 32 pairs of homologous
proteins such as globins, serine proteinases, and immunoglobulin
domains have been compared. The root mean square deviation of the
main-chain atoms in the core regions is plotted against sequence
homology (dots). The curve represents the best fit of the dots to
an exponential function. Pairs with high homology are almost
identical in three-dimensional structure, whereas deviations in atomic
positions for pairs of low homology are of the order of 2 Å.
(From C. Chothia and A. Lesk, EMBO J. 5: 823-826, 1986.)

An instructive case is provided by the antigen-binding sites in
immunoglobulins. These binding sites are built up from the hypervariable
regions (CDRs) from the variable domains of the light and the heavy chains.
There is a very high degree of sequence identity within the scaffold of the
variable domains in different immunoglobulin molecules. Consequently, the
scaffold of known immunoglobulin structures can be used to model any
monoclonal antibody. In contrast, the CDR regions of any antibody are
different in amino acid sequence from those of any other known antibody, and
their three-dimensional structures must be predicted. By comparing
known antibody structures and sequences, it has been shown that there is 
only a small repertoire of main-chain conformations for at least five of the 
six CDR regions and that the particular conformation adopted is determined 
by a few key conserved residues for each loop conformation. For example,
three different conformations were found for the CDR3 regions of the light
chains in nine known x-ray structures. More than 90% of the known
sequences of light-chain CDR3 regions obey the sequence constraints of one or
other of these three conformations. By using this repertoire of loop
conformations, considerable success has been achieved in correctly predicting
the structure of antigen-binding surfaces. An example of such a prediction
compared with the actual structure, subsequently determined, is given in
Figure 17.2.
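
The idea that a few key residues select one conformation from a small
repertoire can be sketched as a rule lookup. The templates below are
invented for illustration only; they are not the published sequence
constraints for any CDR, and the names `assign_conformation` and
`TOY_TEMPLATES` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Template = (loop length, {position in loop: residues allowed at that position})
Template = Tuple[int, Dict[int, str]]

# Invented example rules, NOT real canonical-structure definitions.
TOY_TEMPLATES: Dict[str, Template] = {
    "conformation_1": (6, {0: "Q", 5: "P"}),
    "conformation_2": (6, {5: "LV"}),
}


def assign_conformation(loop_seq: str,
                        templates: Dict[str, Template]) -> Optional[str]:
    """Name of the first template whose length and key residues fit the loop.

    Returns None when no template matches, in which case the loop
    conformation cannot be taken from the repertoire.
    """
    for name, (length, key_residues) in templates.items():
        if len(loop_seq) != length:
            continue
        if all(loop_seq[pos] in allowed for pos, allowed in key_residues.items()):
            return name
    return None


if __name__ == "__main__":
    for loop in ("QQWSNP", "QHYSTL", "QQW"):
        print(loop, assign_conformation(loop, TOY_TEMPLATES))
```

Only the key positions are checked; all other residues in the loop are free
to vary, which mirrors the observation that a few conserved residues
determine the conformation.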

Knowledge of secondary structure is necessary for prediction 
of tertiary structure 

What can be done by predictive methods if the sequence search fails to reveal 
any homology with a protein of known tertiary structure? Is it possible to 
model a tertiary structure from the amino acid sequence alone? There are no
methods available today to do this and obtain a model detailed enough to be
of any use, for example, in drug design and protein engineering. This is,
however, a very active area of research and quite promising results are being
obtained; in some cases it is possible to predict correctly the type of protein,
α, β, or α/β, and even to derive approximations to the correct fold.

Today's predictive methods rely on prediction of secondary structure: in
other words, which amino acid residues are α-helical and which are in β
strands. We have emphasized in an earlier chapter that secondary structure
cannot in general be predicted with a high degree of confidence, with the
possible exceptions of transmembrane helices and α-helical coiled coils. This
imposes a basic limitation on the prediction of tertiary structure. Once the
correct secondary structure is known, we know enough about the rules for
packing elements of secondary structure against each other (see Chapter 2 for
helix packing) to derive a very limited number of possible stable globular
folds. Consequently, secondary structure prediction lies at the heart of the
prediction of tertiary structure from the amino acid sequence.



Figure 17.2 An example of prediction of
the conformations of three CDR regions of a
monoclonal antibody (top row) compared with
the unrefined x-ray structure (bottom row). L1
and L2 are CDR regions of the light chain, and
H1 is from the heavy chain. The amino acid
sequences of the loop regions were modeled by
comparison with the sequences of loop regions
selected from a database of known antibody
structures. The three-dimensional structures of
two of the loop regions, L1 and L2, were in
good agreement with the preliminary x-ray
structure, whereas H1 was not. However,
during later refinement of the x-ray structure
errors were found in the conformation of H1,
and in the refined x-ray structure this loop
was found to agree with the predicted
conformation. In fact, all six loop
conformations were correctly predicted in
this case. (From C. Chothia et al., Science
233: 755-758, 1986.)






